Abstract - Nonparametric density estimation is considered for a discretely observed stationary
continuous-time process. For each of three given time sampling procedures, either random or
deterministic, we establish that histograms and frequency polygons can reach the same optimal
$L^2$-rates as in the independent and identically distributed case. Moreover, thanks to a suitable
"high frequency" sampling design, these rates are derived together with a minimized time of
observation depending on the regularity of sample paths.
1 Introduction

Consider an $\mathbb{R}^d$-valued process $\{X_t, t \in \mathbb{R}\}$, $d \ge 1$, where all $X_t$'s have the same unknown marginal
density $f$. The aim of this paper is to study the rates of some nonparametric piecewise linear
estimators of $f$ when the process is discretely sampled in time at $t = t_1, \ldots, t_n$. During the past
three decades, the problem of density estimation for continuous-time observations has been a
subject of continued interest in the statistical literature. In particular, it was shown by Castellana
and Leadbetter that if a continuous-time process, observed over the time interval $[0,T]$,
has sufficiently irregular sample paths, then nonparametric estimators can achieve a mean-square
parametric rate of convergence $1/T$. An account of the research in this field may be found e.g.
in two complementary monographs by Bosq and Blanke and by Kutoyants [19], and in Lejeune
[23] for the particular case of piecewise linear estimators.



In practice, however, the whole sample path is not always perfectly observable over a given
time period - either for technical reasons or because data are unavailable at some time points. Indeed,
most physical phenomena usually represented by curves actually generate discrete observed
values or interpolated ones. Hence it seems more natural to plan an estimation approach based
upon $n$ discrete values of the process collected with a suitable time sampling procedure. In
the context of nonparametric density estimation, the three sampling procedures considered in
the present work have been investigated by Masry, Prakasa Rao [25], Wu [30] (random
sampling), Bosq and Blanke and Pumo [3] ("high frequency" periodic sampling), among
others. As far as we know, most existing papers - including those cited above - only deal with
kernel estimation, and none has yet paid special attention to piecewise linear estimators.
In that framework, we are interested in the rates of histogram and frequency polygon
estimators with respect to the mean integrated squared error (MISE) criterion. Here, the chosen
histogram-based density estimators have the desirable property of being quickly computed and
updated. This is a clear advantage for many applications where one typically has to
handle large amounts of data in real time. It is noteworthy that elementary estimators may also
be efficient from a theoretical viewpoint. Thus, despite its great simplicity, the frequency
polygon - defined in dimension one as the linear interpolant of the mid-points of an equally
spaced histogram - is known to be as good as some more sophisticated density estimators in
terms of MISE (see Scott).

In this paper, we will show that, under mild conditions, these estimators built with sampled
data have at least the same optimal rates $n^{-2/(d+2)}$ (histogram) and $n^{-4/5}$ (univariate frequency
polygon) as in the independent and identically distributed (i.i.d.) case. First, we will examine
the case of two classical random sampling designs, which are a relevant way to treat the
occurrence of low frequency and irregularly spaced measurements. Next, we will investigate a
deterministic design that applies when the data are observed at high frequency and over a
long time, as in a variety of domains, including econometrics, meteorology, oceanology and many
others. In particular, this sampling design is well adapted to the continuous-time context since
the optimal rates can be derived together with a minimized time of observation depending on
the regularity of sample paths (see Bosq). Thanks to this methodology we will furthermore
address the important issue of finding an optimal sampling strategy.

The paper is organized as follows. In Section 2 we will review the time sampling procedures
and define our framework. Section 3 contains the main assumptions and our results relative to
histograms; the behavior of frequency polygons is then studied in Section 4 and a concluding
discussion is given in Section 5. Finally, the proofs are postponed until Section 6.



2 Preliminaries and notations 

Let $X^T = \{X_t, 0 \le t \le T\}$ be a measurable $\mathbb{R}^d$-valued, $d \ge 1$, continuous-time process on the
probability space $(\Omega, \mathcal{F}, P)$, where the $X_t$'s have a common distribution admitting a density $f$
with respect to the Lebesgue measure over $\mathbb{R}^d$. We suppose that the joint density $f_{(X_s,X_t)}$ of
$(X_s,X_t)$ exists for all $s \ne t$ and satisfies $f_{(X_s,X_t)} = f_{(X_0,X_{|t-s|})} =: f_{|t-s|}$, which is a quite
weak stationarity condition (see e.g. Bosq). We also denote by $g_u$ the function defined for
all $u > 0$ as $g_u := f_u - f \otimes f$ where $(f \otimes f)(y,z) = f(y)f(z)$. Some required asymptotic
independence conditions on the process (including an $\alpha$-mixing condition) will be added later with
our assumptions. Our purpose is to estimate the function $f$ from $n$ observations collected up to
time $T$ by making use of one of the sampling procedures described below.



2.1 Sampling schemes 

Let $\{t_k, 0 \le k \le n\}$ be a strictly increasing sequence of points in time - or event arrival
times - such that $0 = t_0 < t_1 < \cdots < t_n =: T_n$ and $T_n \to \infty$ as $n \to \infty$. If this sequence is random, it is
also assumed that the process $X^T$ and the sampling times are independent and that all $X_{t_k}$'s are measurable
with respect to the $\sigma$-algebra generated by $X^T$ and the sampling times. The first two random schemes, as defined
in Masry, are the following.



Renewal sampling - The set of observation times is a renewal-type process on
$[0, +\infty[$ such that

$$t_0 = 0 \quad\text{and}\quad t_k = \sum_{j=1}^{k} \tau_j, \quad 1 \le k \le n,$$

where $\{\tau_k, 1 \le k \le n\}$ is a sequence of positive and i.i.d. random variables - or inter-arrival
times - generated by a given probability density function $g$ with finite mean $\delta$. Let $g^{*k}$ be
the $k$-fold convolution of $g$ with itself; then $g^{*k}(t)$ is the density function of $t_k$ and we define
the renewal density $h$ by $h(t) := \sum_{k=1}^{\infty} g^{*k}(t)$. Here and below, the function $h$ is supposed to be
bounded by a constant $h_0$.



Remark 1. The renewal density is known to satisfy $h(u) \to \delta^{-1}$ as $u \to \infty$ (see Cox, p. 55),
but its explicit expression is generally complicated to obtain. Nevertheless, the boundedness of
$h$ is a condition which holds for a large class of inter-arrival sequences. For the reader's convenience, we
recall the example in Masry corresponding to the usual situation where the inter-arrival times have a Gamma
density of type $r$, i.e.,

$$g(t) = \frac{(r/\delta)^r}{(r-1)!}\, t^{r-1} e^{-rt/\delta}, \quad t > 0,$$

with mean $\delta$ and variance $\delta^2/r$. Thus, if $r = 1$, $h(t) = \delta^{-1}$ for $t > 0$ (the sampling times form a Poisson process)
and, if $r = 2$, $h(t) = \delta^{-1}(1 - \exp(-4t/\delta))$, which approaches its limit $\delta^{-1}$ monotonically as
$t \to \infty$. In both cases, the value $h_0 = \delta^{-1}$ is clearly appropriate. From $r = 3$ onwards, the condition
becomes delicate to verify since $h(t)$ oscillates while approaching $\delta^{-1}$. The case $r = 1$ is illustrated
e.g. in Ait-Sahalia and Mykland with an example of financial data for which a histogram
of the sampling intervals is fitted by an exponential density.
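
As a numerical illustration of Remark 1, the following Python sketch sums the convolution powers of a Gamma inter-arrival density and compares the truncated series with the two closed forms quoted in the remark. It assumes the parametrization with shape $r$ and rate $r/\delta$ (mean $\delta$, variance $\delta^2/r$), under which $g^{*k}$ is again a Gamma density with shape $kr$; the function names and the truncation level `n_terms` are choices of this sketch, not part of the paper.

```python
import math


def gamma_pdf(t, shape, rate):
    """Density of a Gamma(shape, rate) random variable at t (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(t)
               - rate * t - math.lgamma(shape))
    return math.exp(log_pdf)


def renewal_density(t, r, delta, n_terms=200):
    """Truncated series h(t) = sum_{k=1}^{n_terms} g^{*k}(t).

    Assumption of this sketch: inter-arrival times are Gamma of type r with
    mean delta, so that g^{*k} is Gamma(shape=k*r, rate=r/delta).
    """
    rate = r / delta
    return sum(gamma_pdf(t, k * r, rate) for k in range(1, n_terms + 1))


delta = 0.5
for t in (0.1, 0.5, 2.0):
    series_r1 = renewal_density(t, 1, delta)
    series_r2 = renewal_density(t, 2, delta)
    closed_r1 = 1.0 / delta                                   # Poisson case
    closed_r2 = (1.0 - math.exp(-4.0 * t / delta)) / delta    # type r = 2
    print(t, series_r1 - closed_r1, series_r2 - closed_r2)
```

Both differences printed by the loop should be negligible, and in both cases the series stays below $h_0 = \delta^{-1}$.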

Jittered sampling - First, we assume that the process is regularly observed with a period
$\delta > 0$. This sequence of instants is then contaminated by an additive noise to model the
plausible imperfections of a measurement recording system:

$$t_0 = Z_0 \quad\text{and}\quad t_k = k\delta + Z_k, \quad 1 \le k \le n,$$

where $\{Z_k, 0 \le k \le n\}$ denotes an i.i.d. random sample from a symmetric probability
density function $g_J(z)$ over $[-\delta/2, \delta/2]$. In comparison with renewal times, jittered times could
be seen as only partially random due to the deterministic component in $t_k$.

Remark 2. The observations drawn from each of these two random designs are by definition
irregularly spaced in time, but the "long-term" expected inter-arrival time between two consecutive
random instants is equal to $\delta$ in each case.

Finally, we introduce a periodic scheme examined in Bosq [3] for kernel density estimation
where the sampling step $\delta_n$ decreases with $n$ in a deterministic manner.
High frequency sampling - In order to represent the occurrence of high frequency observations
over a long time, the sampling instants are defined periodically as

$$t_{0,n} = 0 \quad\text{and}\quad t_{k,n} = k\delta_n, \quad 1 \le k \le n,$$

where $\delta_n > 0$ and $\delta_n \to 0^+$, $T_n = n\delta_n \to \infty$ as $n \to \infty$. In the sequel, we will give minimal
thresholds $\delta_n^*$ above which our estimators converge with the optimal rates of the i.i.d. case. The
knowledge of $\delta_n^*$ will also help us to minimize the costs of estimation without altering the rates.
To explain, observe that two situations may occur in applications. First, if the total time of
observation is a given and large enough $T_n$, the value of a minimal $\delta_n^*$ allows one to select a maximal
number $n^*$ of points in $[0,T_n]$ to estimate $f$. On the other hand, if a maximal
and large enough sample size $n$ is available, then we can deduce from $\delta_n^*$ a minimal sufficient
time $T_n^* = n\delta_n^*$ of observation (see Blanke and Pumo). Furthermore, we will emphasize
the convenience of such a framework to sample a continuous-time process. Thus, under the
Castellana-Leadbetter conditions, i.e. $\sup_{(x,y)} \int_0^{+\infty} |g_u(x,y)|\,du < \infty$ and $g_u(\cdot,\cdot)$ is continuous at
$(x,x)$ for any $u > 0$, Bosq [3] proved that $\delta_n$ can be chosen in order to obtain the full rate $1/T_n$
for the pointwise mean squared error of kernel estimators. In that situation, the sampling scheme
is said to be admissible. Concerning admissible sampling in nonparametric density estimation,
let us cite relevant works by Leblanc [20] for wavelet estimators, by Biau [3] for spatial kernel
estimators, and by Comte and Merlevede and Blanke [1], respectively, for projection and
adaptive kernel estimators.
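
To make the three designs of this subsection concrete, here is a minimal Python sketch that generates sampling instants for each of them. Several choices are assumptions of the sketch rather than requirements of the paper: the renewal scheme uses exponential inter-arrival times (the Gamma case $r = 1$), the jitter density $g_J$ is taken uniform on $[-\delta/2, \delta/2]$, and the high frequency step $\delta_n = n^{-1/2}$ is only one example satisfying $\delta_n \to 0^+$ and $n\delta_n \to \infty$.

```python
import random


def renewal_times(n, delta, rng):
    """t_0 = 0 and t_k = tau_1 + ... + tau_k, with i.i.d. exponential tau_j
    of mean delta (assumed here: the Poisson sampling case)."""
    times = [0.0]
    for _ in range(n):
        times.append(times[-1] + rng.expovariate(1.0 / delta))
    return times


def jittered_times(n, delta, rng):
    """t_k = k*delta + Z_k, 0 <= k <= n, with Z_k i.i.d. uniform on
    [-delta/2, delta/2] (one symmetric choice, assumed for illustration)."""
    return [k * delta + rng.uniform(-delta / 2, delta / 2) for k in range(n + 1)]


def high_frequency_times(n, delta_n):
    """t_{k,n} = k*delta_n, 0 <= k <= n, so that T_n = n*delta_n."""
    return [k * delta_n for k in range(n + 1)]


rng = random.Random(0)
n, delta = 1000, 0.5
t_renewal = renewal_times(n, delta, rng)
t_jitter = jittered_times(n, delta, rng)
t_high = high_frequency_times(n, n ** -0.5)

# Mean spacing is close to delta for the two random designs (Remark 2),
# and equal to delta_n for the deterministic one.
for name, t in (("renewal", t_renewal), ("jittered", t_jitter), ("high freq.", t_high)):
    print(name, (t[-1] - t[0]) / n)
```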

2.2 Mean integrated squared error 

The global accuracy of density estimators can be measured by the mean integrated squared error,
which is the expected squared distance between a density estimator $f_n$ and the true density $f$
integrated over $\mathbb{R}^d$:

$$\mathrm{MISE}(f_n) = E \int_{\mathbb{R}^d} (f_n(x) - f(x))^2\,dx.$$

It is also the sum of the integrated squared bias (ISB) and the integrated variance (IV):

$$\mathrm{ISB}(f_n) = \int_{\mathbb{R}^d} (E(f_n(x)) - f(x))^2\,dx \quad\text{and}\quad \mathrm{IV}(f_n) = \int_{\mathbb{R}^d} E(f_n(x) - E(f_n(x)))^2\,dx.$$

Let us fix the following usual notations: $C^k(\mathbb{R}^d)$ denotes the set of $k$-times continuously
differentiable functions and $L^k(\mathbb{R}^d)$ the set of functions with integrable $k$th power over $\mathbb{R}^d$, equipped with
the norm $\|f\|_k = (\int_{\mathbb{R}^d} |f(x)|^k\,dx)^{1/k}$.

3 Histogram 

We first examine the histogram, which is the oldest and most popular nonparametric
estimator. Because of its simplicity, the histogram is still widely used in presentation and practice
by statisticians. Its theoretical properties have also been extensively studied in the i.i.d. case
and we may refer e.g. to Scott [28] (Chapter 3) for background material. For observations
delivered in continuous time, both optimal and full rates of MISE and asymptotic normality under
the Castellana-Leadbetter conditions are given in Lejeune [23]. In this section we derive results for
observations collected at discretized instants according to each of the three sampling schemes of Section 2.1.



3.1 Definitions and assumptions 

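Prior to the definition of our estimator, we introduce a partition of $\mathbb{R}^d$, say $\Pi_n$, into hypercubes
of volume $h_n^d$ such that $h_n \to 0^+$, $n h_n^d \to \infty$ as $n \to \infty$:

$$\Pi_n = \{\pi_{nj},\ j \in \mathbb{Z}^d\} \quad\text{and}\quad \pi_{nj} = \prod_{k=1}^{d} [b_{j_k}, b_{j_k+1}[ \;=\; \prod_{k=1}^{d} \Bigl[c_{j_k} - \frac{h_n}{2},\ c_{j_k} + \frac{h_n}{2}\Bigr[, \quad j = (j_1, \ldots, j_d)' \in \mathbb{Z}^d,$$

where $b_j = (b_{j_1}, \ldots, b_{j_d})' \in \mathbb{R}^d$, $b_{j_k+1} - b_{j_k} = h_n$ and $c_{j_k} = (b_{j_k} + b_{j_k+1})/2$. Here $h_n$ is the
smoothing parameter commonly referred to as the bin width. Note that the extension to unequal
bin sizes is straightforward at the cost of heavier notation. From now on, we will suppose for any $x \in \mathbb{R}^d$
the existence of an index $j(x,n)$ in $\mathbb{Z}^d$ such that $x \in \pi_{j(x,n)}$ ($=: \pi_{nj}$).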

Given $\Pi_n$ and $n$ discretized observations $X_{t_1}, \ldots, X_{t_n}$, the histogram estimator of $f$ is then
defined as

$$f_n^H(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} 1_{\pi_{nj}}(X_{t_i}), \quad x \in \pi_{nj},$$

where $1_{\pi_{nj}}$ denotes the indicator function of $\pi_{nj}$. In particular, $f_n^H$ has a unique value, denoted
by $f_j$, over each hypercube $\pi_{nj}$ of $\Pi_n$, which explains its high computational advantage.
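
The histogram estimator can be sketched in a few lines of Python. The version below is a minimal illustration for $d = 1$ only; it assumes bins of the form $[jh, (j+1)h[$, i.e. a partition anchored at the origin, which is a convention of the sketch (the paper leaves the bin edges $b_j$ generic). Bin counts are stored in a dictionary, so evaluating the estimator costs a single lookup.

```python
import math
from collections import Counter


def histogram_estimator(sample, h):
    """Return x -> f_n^H(x) for a univariate sample and bin width h.

    Assumption of this sketch: bin j is [j*h, (j+1)*h[.
    """
    n = len(sample)
    counts = Counter(math.floor(x / h) for x in sample)

    def f_hat(x):
        # Same value f_j everywhere on the bin containing x (0 on empty bins).
        return counts[math.floor(x / h)] / (n * h)

    return f_hat


f_hat = histogram_estimator([0.1, 0.2, 0.6], h=0.5)
print(f_hat(0.3), f_hat(0.7), f_hat(5.0))
```

Because only the counts are kept, a new observation can be incorporated by incrementing one counter and the sample size, which is the fast-update property mentioned in the Introduction.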

Let $\mathcal{A}$ and $\mathcal{B}$ be two sub-$\sigma$-algebras of $\mathcal{F}$; we introduce the classical strong mixing coefficient
defined as

$$\alpha(\mathcal{A}, \mathcal{B}) := \sup_{A \in \mathcal{A},\, B \in \mathcal{B}} |P(A \cap B) - P(A)P(B)|.$$

Let $\sigma(X)$ denote the $\sigma$-algebra of events generated by a random variable $X$. In the sequel, we
will use the definition of a 2-$\alpha$-mixing process $\{X_t, t \in \mathbb{R}\}$ given in Bosq as

$$\alpha^{(2)}(u) := \sup_{t} \alpha(\sigma(X_t), \sigma(X_{t+u})) \to 0 \quad\text{as } u \to \infty.$$

Note that such a condition, imposed only on the couples $(X_t, X_{t+u})$, is less restrictive than the classical
one introduced by Rosenblatt [26].

We now state the main assumptions on the processes.

Assumptions $A_0$

(i) / G (M'^) so that all the partial derivatives are square Riemann-integrable; 

(ii) $f$ is continuous and $\|f\|_\infty = \sup_{y \in \mathbb{R}^d} f(y) < \infty$.

Assumptions $A_1$ (with renewal and jittered samplings)

(i) There exists $u_0 > 0$ such that for any $u > u_0$: $\sup_{z \in \mathbb{R}^d} |g_u(y,z)| \le k(y)$ with $k(\cdot)$ a positive,
continuous and integrable function defined on $\mathbb{R}^d$;

(ii) $X^T$ is an arithmetically strongly mixing (ASM) process, i.e. there exist $\rho > 2$, $a_0 > 0$ and
$u_1 \ge u_0$ such that for any $u \ge u_1$: $\alpha^{(2)}(u) = \alpha(\sigma(X_0), \sigma(X_u)) \le a_0 u^{-\rho}$.

Assumptions $A'_1$ (with high frequency sampling)

(i) There exist $\gamma_0 > 0$ and $u_0 > 0$ such that for any $0 < u \le u_0$: $\forall y \in \mathbb{R}^d$, $\sup_{z \in \mathbb{R}^d} f_u(y,z) \le \varphi(y) u^{-\gamma_0}$
with $\varphi(\cdot)$ a positive, continuous and integrable function defined on $\mathbb{R}^d$;

(ii) There exists a positive, continuous and integrable function $k(\cdot)$ defined on $\mathbb{R}^d$ such that for
any $u > u_0$: $\forall y \in \mathbb{R}^d$, $\sup_{z \in \mathbb{R}^d} |g_u(y,z)| \le k(y)\pi(u)$ where $\pi(\cdot)$ is a bounded and ultimately
decreasing function which satisfies $\int_{u_1}^{+\infty} \pi(u)\,du < \infty$, $u_1 \ge u_0$.

The assumptions above are classical in nonparametric estimation with dependent data. $A_0$
imposes some regularity constraints on the true density $f$. Condition $A_0(i)$ is specific
to the bias treatment; it was previously introduced by Lecoutre [21] to study the multivariate
histogram in the i.i.d. case.

The following conditions take into account the local behavior of sample paths as well as
the asymptotic independence properties of the process (described, respectively, by the behavior
of $g_u$ for $u$ near the origin and for $u$ large). $A_1(i)$ is a mild condition on $g_u$ for intermediate
values of $u$. In particular, it slightly weakens the assumption of boundedness on the conditional
density used by Masry and by Carbon, Garel, and Tran.

$A'_1(i)$ appears to be less usual in density estimation, but it is a typical condition in the
continuous-time framework to control the explosive behavior of the joint densities $f_u(\cdot,\cdot)$ in a
neighborhood of $u = 0$. Assumptions $A'_1$ are in the spirit of those made (and widely discussed) by
Blanke and Pumo. Here $A'_1(i)$ is used with high frequency sampling to obtain optimal rates
together with a short sampling step $\delta_n$ depending on a positive known coefficient $\gamma_0$. Roughly
speaking, the value of $\gamma_0$ is directly linked with the Hölderian properties of sample paths and
the dimension $d$: namely, one has $\gamma_0 = d/2$ for a wide class of $d$-dimensional ergodic diffusion
processes and $\gamma_0 = d$ for "smooth" processes (see e.g. Blanke for technical details).

The other assumptions, namely $A_1(ii)$ and $A'_1(ii)$, ensure asymptotic independence between
variables distant in time. $A_1(ii)$ involves a mild version of $\alpha$-mixing, which is well known to be weaker
than many dependence structures such as $\varphi$-, $\beta$- or $\rho$-mixing (see e.g. Doukhan [3]). Finally, admissible
high frequency samplings are obtained under $A'_1(ii)$, a quite typical condition in this context.



3.2 Rates of convergence 

Using each sampling design defined above, we will now establish the optimal rate of histograms.
For the sake of readability, some crucial lemmas which provide upper bounds for the variances
and the covariances of $f_n^H$ are postponed to the proofs. Let $f'_i := \partial f/\partial x_i$ and define the roughness
$R$ of $f'_i$ by its squared $L^2$-norm: $R(f'_i) := \int_{\mathbb{R}^d} f'_i(x)^2\,dx$. Since the bias of $f_n^H$ only depends on
the bin width and the true unknown density $f$, and not on the dependence structure of the data,
we recall the following result given by Lecoutre [21] with multivariate independent observations.

Lemma 3.1. If Assumption $A_0(i)$ is satisfied then

$$\mathrm{ISB}(f_n^H) = \frac{h_n^2}{12} R_d(f') + o(h_n^2),$$

where $R_d(f') := \sum_{i=1}^{d} R(f'_i)$.

3.2.1 Renewal and jittered samplings 

Let us denote by $\lceil x \rceil$ the smallest integer not less than the real $x$. The first part of the next
theorem gives an asymptotic upper bound for IV. Consequently, from an ad hoc choice of the bin
width $h_n$ which balances both ISB and IV terms, we infer that histograms can reach the same
optimal rate $n^{-2/(d+2)}$ of convergence to $f$ as in the i.i.d. case.

Theorem 3.2. 1. Under $A_0(ii)$ and $A_1$ and if $f^{(p-1)/p} \in C(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$ for $1 < p < \rho - 1$,
then

$$\limsup_{n \to \infty}\, n h_n^d\, \mathrm{IV}(f_n^H) \le 1 + C,$$

where $C = 2u_0h_0$ for the renewal sampling and $C = 2\lceil u_0/\delta \rceil$ for the jittered sampling;
2. If in addition $A_0(i)$ holds then the choice $h_n = cn^{-1/(d+2)}$, $0 < c < \infty$, yields

$$\limsup_{n \to \infty}\, n^{2/(d+2)}\, \mathrm{MISE}(f_n^H) \le \frac{c^2}{12} R_d(f') + \frac{1}{c^d}(1 + C),$$

with the same constant $C$.

Remark 3. If $p = \rho - 1$ the rates in Theorem 3.2 remain valid but with larger asymptotic constants
(see proofs). Thus, if for instance $\rho > 3$, one may choose $p = 2$ provided that $f^{1/2}$ is continuous
and integrable.
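
The trade-off behind the choice $h_n = cn^{-1/(d+2)}$ can be explored numerically. The Python sketch below evaluates the asymptotic constant of Theorem 3.2, $c^2 R_d(f')/12 + (1 + C)/c^d$, and the value of $c$ that minimizes it, $c^* = (6d(1 + C)/R_d(f'))^{1/(d+2)}$. This minimizer is obtained here by elementary calculus and is not stated in the paper; the roughness and the constant $C$ are treated as known inputs, which is an assumption since both are unknown in practice.

```python
import math


def bin_width(n, d, c):
    """Bin width h_n = c * n^(-1/(d+2))."""
    return c * n ** (-1.0 / (d + 2))


def mise_bound_constant(c, d, roughness, C):
    """Asymptotic constant c^2 * R_d(f') / 12 + (1 + C) / c^d."""
    return c ** 2 * roughness / 12.0 + (1.0 + C) / c ** d


def best_c(d, roughness, C):
    """Minimizer of mise_bound_constant over c > 0 (derived in this sketch)."""
    return (6.0 * d * (1.0 + C) / roughness) ** (1.0 / (d + 2))


# Standard normal density in dimension one: R(f') = 1 / (4 * sqrt(pi)).
roughness = 1.0 / (4.0 * math.sqrt(math.pi))
for C in (0.0, 1.0, 4.0):   # C = 0 corresponds to the i.i.d. constant
    c_star = best_c(1, roughness, C)
    print(C, c_star, bin_width(1000, 1, c_star), mise_bound_constant(c_star, 1, roughness, C))
```

With $C = 0$ and a standard normal density the minimizer is close to 3.49, which agrees with the familiar normal reference rule for histograms; a positive $C$, reflecting dependence in the sampled data, pushes the suggested bin width up.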

3.2.2 High frequency sampling 

The high frequency model is of interest because it connects the discrete and
continuous-time frameworks. Here the period $\delta_n$ is a function of the sample size $n$ so that
all observations can be as close in time as desired provided $n$ is large enough. Within this setup
we also need the local condition $A'_1(i)$ on the joint density of $(X_0,X_u)$ for small
values of $u$, wherein a (known) coefficient $\gamma_0$ is linked with the regularity of sample paths. In this
framework the previous optimal rate of order $n^{-2/(d+2)}$ is still preserved. Moreover, depending
on the value of $\gamma_0$, we can derive a minimal $\delta_n^*$ (more precisely $\delta_n^*(\gamma_0)$) and then deduce a minimal
time of observation $T_n^*$ of the process that ensures this rate.

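Theorem 3.3. According to the value of $\gamma_0$ we assume that $\delta_n \ge \delta_n^*(\gamma_0)$ defined as

$$\delta_n^*(\gamma_0) := d_1 h_n^d\, 1_{\{\gamma_0 < 1\}} + d_2 h_n^d \ln(h_n^{-d})\, 1_{\{\gamma_0 = 1\}} + d_3 h_n^{d/\gamma_0}\, 1_{\{\gamma_0 > 1\}}, \quad 0 < d_1, d_2, d_3 < \infty. \quad (3.1)$$

1. Then under $A_0(ii)$ and $A'_1$

$$\limsup_{n \to \infty}\, n h_n^d\, \mathrm{IV}(f_n^H) \le 1 + C_{\gamma_0},$$

where $C_{\gamma_0}$ is a positive constant which depends upon $\gamma_0$ (see its explicit form in proofs);

2. If in addition $A_0(i)$ holds with $h_n = cn^{-1/(d+2)}$, $0 < c < \infty$, then

$$\limsup_{n \to \infty}\, n^{2/(d+2)}\, \mathrm{MISE}(f_n^H) \le \frac{c^2}{12} R_d(f') + \frac{1}{c^d}(1 + C_{\gamma_0}),$$

with the same constant $C_{\gamma_0}$.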



Remark 4. Using $A_0$ with either $A_1$ or $A'_1$, our results in Theorems 3.2 and 3.3 are similar to those
derived with independent variables by Lecoutre [21] in the $d$-dimensional setup. Thus we retrieve
(in limsup) the same optimal rate $n^{-2/(d+2)}$ in terms of MISE. The additional asymptotic
constant $C$ or $C_{\gamma_0}$ in the variance bound arises as a non-negligible remainder of the covariance term;
it clearly depends on the sampling design in use. Nevertheless, if $\delta_n$ is such that $\delta_n/\delta_n^*(\gamma_0) \to \infty$
as $n \to \infty$, we can remove $C_{\gamma_0}$ in Theorem 3.3 to get the exact limiting constant of the i.i.d.
case with $h_n = cn^{-1/(d+2)}$, $0 < c < \infty$.

Remembering that $T_n = n\delta_n$, the rate $n^{-2/(d+2)}$ may be easily rewritten in terms of $T_n$
according to the value of $\gamma_0$.

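Corollary 3.4. Under $A_0$ and $A'_1$ the choice $h_n = cn^{-1/(d+2)}$, $0 < c < \infty$, leads to

$$\mathrm{MISE}(f_n^H) = \begin{cases} O(T_n^{-1}) & \text{with } \delta_n = d_1 h_n^d,\ 0 < d_1 < \infty, \text{ if } \gamma_0 < 1; \\ O(T_n^{-1} \ln T_n) & \text{with } \delta_n = d_2 h_n^d \ln(h_n^{-d}),\ 0 < d_2 < \infty, \text{ if } \gamma_0 = 1; \\ O\bigl(T_n^{-2\gamma_0/(2\gamma_0 + d(\gamma_0 - 1))}\bigr) & \text{with } \delta_n = d_3 h_n^{d/\gamma_0},\ 0 < d_3 < \infty, \text{ if } \gamma_0 > 1. \end{cases}$$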



Remark 5. For the special case of processes with irregular paths ($\gamma_0 < 1$), we thus observe in Corollary
3.4 a surprising similarity between the best rate of order $1/T_n$ and the $1/T$-parametric rate
encountered in the genuine continuous-time context. Indeed, the time of observation clearly depends
on the value of $\gamma_0$ since $T_n$ has to be of order $n^{2/(d+2)}$ ($\gamma_0 < 1$), $n^{2/(d+2)} \ln n$ ($\gamma_0 = 1$) or
$n^{(2\gamma_0 + d(\gamma_0 - 1))/((d+2)\gamma_0)}$ ($\gamma_0 > 1$) to obtain the same efficiency in estimation. In particular, this
highlights the fact that processes with irregular paths may be observed for less time than more regular
ones ($\gamma_0 > 1$).
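
The orders of magnitude in Remark 5 follow from $T_n = n\delta_n$ with $\delta_n = \delta_n^*(\gamma_0)$ from (3.1) and $h_n = cn^{-1/(d+2)}$. The Python sketch below encodes the threshold and the resulting exponent of $n$; the constants $d_1, d_2, d_3$ are set to 1 and $c = 1$ purely for illustration, and the function names are choices of this sketch.

```python
import math


def min_sampling_step(h, d, gamma0, d1=1.0, d2=1.0, d3=1.0):
    """Threshold delta_n^*(gamma0) of (3.1) for a bin width h < 1."""
    if gamma0 < 1:
        return d1 * h ** d
    if gamma0 == 1:
        return d2 * h ** d * math.log(h ** (-d))
    return d3 * h ** (d / gamma0)


def time_exponent(d, gamma0):
    """Exponent a such that T_n = n * delta_n^* is of order n^a
    (up to a ln n factor when gamma0 == 1)."""
    if gamma0 <= 1:
        return 2.0 / (d + 2)
    return (2.0 * gamma0 + d * (gamma0 - 1.0)) / ((d + 2) * gamma0)


d, n = 2, 10_000
h = n ** (-1.0 / (d + 2))           # h_n with c = 1
for gamma0 in (0.5, 1.0, 2.0):      # e.g. gamma0 = d/2 = 1 for diffusions, d = 2 for smooth paths
    T_n = n * min_sampling_step(h, d, gamma0)
    print(gamma0, time_exponent(d, gamma0), T_n)
```

The exponent increases with $\gamma_0$, so smoother paths require a longer observation window to reach the same MISE rate, in line with the remark.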

Finally, it may be interesting to indicate the exact limit of the pointwise variance of $f_n^H(x)$
in the case $\gamma_0 < 1$. The following proposition is thus obtained as a simple transposition from
kernel to histogram estimators of a result by Bosq (Proposition 7.1 (i)).

Proposition 3.1. Let $x \in \mathbb{R}^d$ and assume that

(i) $\|g_u\|_\infty \le \pi(u)$ where $(1 + u)\pi(u)$ is integrable over $]0,+\infty[$ and $u\pi(u)$ is bounded and
ultimately decreasing. Furthermore $g_u(\cdot,\cdot)$ is continuous at $(x,x)$;

(ii) $\sup_{(y,z) \in \mathbb{R}^{2d}} \bigl|\sum_{r=1}^{\infty} \delta_n g_{r\delta_n}(y,z) - \int_0^{+\infty} g_u(y,z)\,du\bigr| \to 0$ as $\delta_n \downarrow 0^+$,

then

$$\lim_{n \to \infty} T_n \operatorname{Var}(f_n^H(x)) = 2 \int_0^{+\infty} g_u(x,x)\,du,$$

provided that $\delta_n = o(h_n^d)$.

Remark 6. From Kutoyants [18], the limiting constant is also the minimax bound for the mean
squared error in the case of ergodic diffusion processes satisfying some regularity conditions
on the trend coefficient and the diffusion coefficient (see Veretennikov [29]).



4 Frequency polygon 

Given a (univariate) histogram, the frequency polygon results from a natural smoothing with
straight lines to get a continuous estimator. The gain of this simple linear smoothing is nevertheless
substantial since we immediately improve the weak order $h_n^2$ inherent to the bias of histograms.
The main properties of frequency polygons are gathered in Scott (Chapter 4) within the
i.i.d. setup. The mixing case was then treated by Carbon, Garel, and Tran, and recently
extended to random fields by Carbon. In continuous time, Lejeune [23] established
both optimal and parametric rates of MISE and asymptotic normality; the extension to
random fields is done in a submitted work by Bensaid and Dabo-Niang [3]. For the sake of
simplicity, we shall confine attention to the real case ($d = 1$ and $\gamma_0 \le 1$).

For convenience, $f'$ and $f''$ denote the first and second derivatives of $f$ and we define the
roughness of $f''$ by $R(f'') := \int_{\mathbb{R}} f''(x)^2\,dx$.

4.1 Definition and assumptions 

Based upon $\Pi_n$ and $X_{t_1}, \ldots, X_{t_n}$, the frequency polygon $f_n^{FP}$ is simply constructed by connecting the
mid-points of the histogram heights with straight line segments.

In the literature we also find alternative definitions which differ in the way of interpolating,
e.g. the edge frequency polygon introduced by Jones, Samiuddin, Al-Harbey and Maatouk
[17] or its extended form by Dong and Zheng [15]. All these estimators share the same rates of
convergence but with different asymptotic constants.
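
The construction of the frequency polygon can be sketched as follows in Python. This is a minimal univariate illustration, assuming bins $[jh, (j+1)h[$ anchored at the origin with mid-points $c_j = (j + 1/2)h$ (a convention of the sketch); between two consecutive mid-points the estimator linearly interpolates the histogram heights $f_j$ and $f_{j+1}$.

```python
import math
from collections import Counter


def frequency_polygon(sample, h):
    """Return x -> f_n^FP(x), the linear interpolant of histogram heights at bin mid-points.

    Assumption of this sketch: bin j is [j*h, (j+1)*h[ with mid-point (j + 0.5)*h.
    """
    n = len(sample)
    counts = Counter(math.floor(x / h) for x in sample)

    def height(j):
        # Histogram value f_j on bin j (0 on empty bins).
        return counts[j] / (n * h)

    def f_fp(x):
        j = math.floor(x / h - 0.5)      # mid-point c_j is the closest one to the left of x
        c_j = (j + 0.5) * h
        w = (x - c_j) / h                # relative position of x in [c_j, c_{j+1}[
        return (1.0 - w) * height(j) + w * height(j + 1)

    return f_fp


f_fp = frequency_polygon([0.1, 0.2, 0.6], h=0.5)
print(f_fp(0.25), f_fp(0.5), f_fp(0.75))
```

At a mid-point the polygon coincides with the histogram height, and at a bin edge it equals the average of the two neighbouring heights, so it needs no more storage than the histogram itself.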

In agreement with assumptions $A_1$ and $A'_1$ of the previous section we will describe the
properties of the frequency polygon under the following conditions on $f$.

Assumptions $A'_0$

(i) $f \in C^2(\mathbb{R})$, $f'' \in L^1(\mathbb{R})$ and $f, f'' \in L^2(\mathbb{R})$;

(ii) $|f''(x) - f''(y)| \le l_0|x - y|^{\nu}$, $l_0 > 0$, $\nu \in\, ]0,1]$, for $(x,y) \in \mathbb{R}^2$;

(iii) $f$ is continuous and $\|f\|_\infty < \infty$.
4.2 Rates of convergence

The ISB contribution is given in Scott [27].

Lemma 4.1. If Assumptions $A'_0(i)(ii)$ are satisfied then

$$\mathrm{ISB}(f_n^{FP}) = \frac{49}{2880} R(f'') h_n^4 + o(h_n^4).$$

Remark 7. The order $h_n^4$ is much better than for histograms and is familiar from more
sophisticated density estimators such as kernel estimators. As emphasized earlier, the bias term does
not depend on the sampling scheme.

4.2.1 Renewal and jittered samplings 

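Using the analysis on histograms with a new suitable choice of $h_n$ we give the optimal rate of
frequency polygons. Note that the constants $C$ and $C_{\gamma_0}$ are unchanged.

Theorem 4.2. 1. Under $A'_0(iii)$ and $A_1$ and if $f^{(p-1)/p} \in C(\mathbb{R}) \cap L^1(\mathbb{R})$ for $1 < p < \rho - 1$,
then

$$\limsup_{n \to \infty}\, n h_n\, \mathrm{IV}(f_n^{FP}) \le \frac{2}{3} + C;$$

2. If in addition $A'_0(i)(ii)$ hold then the choice $h_n = cn^{-1/5}$, $0 < c < \infty$, yields

$$\limsup_{n \to \infty}\, n^{4/5}\, \mathrm{MISE}(f_n^{FP}) \le \frac{49}{2880} c^4 R(f'') + \frac{1}{c}\Bigl(\frac{2}{3} + C\Bigr).$$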

4.2.2 High frequency sampling 

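Finally, recovering the local properties of sample paths when data become dense in time, we find
again the optimal rate while minimizing the time of observation.

Theorem 4.3. According to the value of $\gamma_0$, we consider optimal choices $\delta_n^*(\gamma_0)$ given by (3.1).
1. Then under $A'_0(iii)$ and $A'_1$

$$\limsup_{n \to \infty}\, n h_n\, \mathrm{IV}(f_n^{FP}) \le \frac{2}{3} + C_{\gamma_0};$$

2. If in addition $A'_0(i)(ii)$ hold with $h_n = cn^{-1/5}$, $0 < c < \infty$, then

$$\limsup_{n \to \infty}\, n^{4/5}\, \mathrm{MISE}(f_n^{FP}) \le \frac{49}{2880} c^4 R(f'') + \frac{1}{c}\Bigl(\frac{2}{3} + C_{\gamma_0}\Bigr).$$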

Remark 8. In both Theorems 4.2 and 4.3 we exhibit (in limsup) the same $n^{-4/5}$-consistency
obtained in Scott [27] with i.i.d. observations. The additional asymptotic constant $C$ or $C_{\gamma_0}$ still
remains and depends on the sampling design in use; but, in Theorem 4.3, any choice of $\delta_n$ satisfying
$\delta_n/\delta_n^*(\gamma_0) \to \infty$ as $n \to \infty$ allows us to remove $C_{\gamma_0}$ and to get the exact limiting constant of the i.i.d.
case with $h_n = cn^{-1/5}$. Finally, note that if we take $p = \rho - 1$ in Theorem 4.2 the rates remain
valid with a larger asymptotic constant.
Corollary 4.4. Under $A'_0$ and $A'_1$ the choice $h_n = cn^{-1/5}$, $0 < c < \infty$, leads to

$$\mathrm{MISE}(f_n^{FP}) = \begin{cases} O(T_n^{-1}) & \text{with } \delta_n = d_1 h_n,\ 0 < d_1 < \infty, \text{ if } \gamma_0 < 1; \\ O(T_n^{-1} \ln T_n) & \text{with } \delta_n = d_2 h_n \ln(h_n^{-1}),\ 0 < d_2 < \infty, \text{ if } \gamma_0 = 1. \end{cases}$$

Remark 9. As noticed before, real processes with irregular paths may be observed for less time than more
regular ones since $T_n$ has to be of order $n^{4/5}$ ($\gamma_0 < 1$) or $n^{4/5} \ln n$ ($\gamma_0 = 1$) to obtain the same
efficiency in estimation.

For completeness, the exact limit of the pointwise variance of $f_n^{FP}(x)$ follows straightforwardly
from Proposition 3.1 (see also Remark 6).

Proposition 4.1. Under the conditions of Proposition 3.1 with $\delta_n = o(h_n)$, one has

$$\lim_{n \to \infty} T_n \operatorname{Var}(f_n^{FP}(x)) = 2 \int_0^{+\infty} g_u(x,x)\,du, \quad x \in \mathbb{R}.$$

5 Discussion 

In this work we derive the optimal $L^2$-rates of two computationally advantageous density
estimators in the setup where observations are discretely sampled from a continuous-time process. For
practical considerations we have studied three time sampling procedures to properly describe the
time occurrences of real data. Thus, values may be available at low or high frequency, and
regularly or irregularly spaced in time. Our main results state that all designs, either
random or deterministic, lead to the optimal rates $n^{-2/(d+2)}$ for histograms and $n^{-4/5}$ ($d = 1$)
for frequency polygons, with respect to the MISE criterion, which are those derived in the
i.i.d. case. In view of this result, the frequency polygon is a good alternative to more sophisticated
nonparametric density estimators. In particular, we have focused on high frequency sampling
to reveal some parallels with the idealized continuous-time framework as soon as observations
are selected close enough to each other. We then use the local properties of sample paths to
obtain consistent estimation with a minimal time of experiment. This fact might be explained
as follows: irregular sample paths carry much more information than regular ones, for which the
correlation between two successive variables is much stronger. Consequently, we infer that the more
irregular the paths are - i.e. when $A'_1(i)$ holds with $\gamma_0 < 1$ - the more the time of observation
can be shortened while both estimators still behave well. Although not presented here,
simulations in progress already corroborate our theoretical results in the particular case of two
stationary real Gaussian processes. As expected, the frequency polygon performs well and appears
much closer to the kernel estimator than to the histogram. To go further in our investigations, it remains
to examine the case of non-Gaussian processes including, for instance, the cumbersome problem
of estimating bimodal densities. The important issue of finding optimal choices for the bin width
value is left for future work.

6 Proofs 

Throughout this section, we detail the proofs of Theorems 3.2, 3.3, 4.2 and 4.3. In order to do
this, some auxiliary lemmas are necessary to derive upper bound expressions for the variance of
$f_n^H(x)$, $x \in \pi_{nj}$, which will depend on the sampling scheme being used. Let $\|X\|_q = (E|X|^q)^{1/q}$
with $1 \le q \le \infty$; then $X \in L^q(P)$ means that $\|X\|_q < \infty$. We recall the following useful
covariance inequality as written in Bosq (p. 21).



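Lemma 6.1 (Davydov's inequality). Let $X \in L^q(P)$ and $Y \in L^r(P)$ with $q > 1$, $r > 1$ and
$\frac{1}{q} + \frac{1}{r} < 1$, then

$$|\operatorname{Cov}(X,Y)| \le 2p\,\bigl[2\alpha(\sigma(X), \sigma(Y))\bigr]^{1/p}\, \|X\|_q \|Y\|_r,$$

where $\frac{1}{p} + \frac{1}{q} + \frac{1}{r} = 1$.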



6.1 Histogram 

6.1.1 Variance bounds with random sampling 

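Lemma 6.2 (renewal sampling). If $A_0(ii)$ and $A_1$ hold then we obtain for $1 < p < \rho - 1$:

$$n h_n^d \operatorname{Var}(f_j) \le f(\xi_j)\bigl(1 - h_n^d f(\xi_j)\bigr)(1 + 2u_0h_0) + 2h_0 k(\tilde\xi_j) h_n^{\varepsilon} + \frac{4p^2 (2a_0)^{1/p} h_0}{\rho - p}\, f(\xi_j)^{1 - \frac{1}{p}}\, h_n^{\frac{1}{p}\{(d-\varepsilon)(\rho-p)-d\}}, \quad (6.2)$$

with $0 < \varepsilon < d\bigl(1 - \frac{1}{\rho - p}\bigr)$ and $(\xi_j, \tilde\xi_j) \in \pi_{nj}^2$.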



Lemma 6.3 (jittered sampling). Under the same conditions as in Lemma 6.2 and $1 < p < \rho - 1$:



nhi Var(/,) < f{i,){l - hif{^j)) (l + 2 [^]) + 2k{^j) (h[ 



Uo 

6 



{p-p)6p 



^ (6.3) 



r2.. 



with 0<e <d(l- and G vr^ 

For further use, we give the proofs for the covariances. 



Proof of Lemma 6.2. For any $(x,y) \in \mathbb{R}^d \times \mathbb{R}^d$, we suppose the existence of two indexes $j_1(x,n)$
and $j_2(y,n)$ in $\mathbb{Z}^d$ such that $x \in \pi_{j_1(x,n)}$ ($=: \pi_{nj_1}$) and $y \in \pi_{j_2(y,n)}$ ($=: \pi_{nj_2}$). Thus

^ n ^ n 



k=l ^ k=l 



and 



" k=i 

71—1 n 

+ E Cov(l.„^^(X,J,l.„^^(X,J)=:K + C„. 

p=l q=p+l 
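Set $p_k := P(X_0 \in \pi_{nk})$, $k \in \mathbb{Z}^d$. The "variance term" $V_n$ is easy to compute:

$$V_n = \frac{1}{n^2 h_n^{2d}} \sum_{k=1}^{n} \operatorname{Cov}\bigl(1_{\pi_{nj_1}}(X_0), 1_{\pi_{nj_2}}(X_0)\bigr) = \frac{1}{n h_n^{2d}} \bigl(P(X_0 \in \pi_{nj_1}, X_0 \in \pi_{nj_2}) - p_{j_1}p_{j_2}\bigr).$$

Since $f$ is continuous there exists at least one point $\xi_j \in \pi_{nj}$ such that $\int_{\pi_{nj}} f(x)\,dx = h_n^d f(\xi_j)$.
Then if $j_1 \ne j_2$, we get

$$V_n = -\frac{1}{n} f(\xi_{j_1}) f(\xi_{j_2}),$$

where $(\xi_{j_1}, \xi_{j_2}) \in \pi_{nj_1} \times \pi_{nj_2}$. Otherwise if $j_1 = j_2 = j$:

$$V_n = \frac{1}{n h_n^d} f(\xi_j)\bigl(1 - h_n^d f(\xi_j)\bigr).$$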
Let us turn to the "covariance term" $C_n$. By stationarity and since $t_p - t_q$ and $t_{p-q}$ are equal in
distribution, we have

2 n—ln—r 
r=l p=l 

= T;^ E (1 - D / (^0), (X„))5*^(n)dn =: C„,i + C„,2 + C„,3, 

''"n ^ ''"^ Jo 

where 

„ n— 1 „ 

C'n,^ E (1 - -) / Cov(l,„^.^ (Xo), 1,„,^ (X„))5-(^x)d«, i = 1, 2, 3, 

with $E_1 = (0, u_0)$, $E_2 = (u_0, h_n^{-(d-\varepsilon)})$, $E_3 = (h_n^{-(d-\varepsilon)}, \infty)$, for some $0 < \varepsilon < d$ to be specified
later. Recalling that $h(u) = \sum_{k=1}^{\infty} g^{*k}(u)$, one seeks to bound each covariance subterm. First, by
the Cauchy-Schwarz inequality and Fubini's theorem,

\Cn,i\ < ^^Var(l,„,^ (Xo)) ^Var(l,„,^ (Xq)) £° h{u)du 

Then $A_1(i)$ and Fubini imply

|Cn,2| < / // sup |5„(x,y)|dxdy /i(n)du < 2/tofc(4i)^n' 

where e tt^^i . 
Now, it is clear that for $n$ large enough we have $h_n^{-(d-\varepsilon)} > u_1$. So using Davydov's inequality
(Lemma 6.1) with mixing condition $A_1(ii)$ and Fubini, for any $(p,q) \in\, ]1, \rho - 1] \times ]2, +\infty[$
such that $\frac{1}{p} + \frac{2}{q} = 1$:



1 + ^,00 



Finally, setting $k_1 := 2u_0h_0$, $k_2 := 2h_0$ and $k_3 := \frac{4p^2 (2a_0)^{1/p} h_0}{\rho - p}$, one has



n. 



+ k2k{i,,)K + k3^Jf{Cj,f'h{^n)'~~- ''^ (6-4) 

We then deduce the lemma by taking $j_1 = j_2 = j$ with the appropriate expression of $V_n$. It turns
out that the covariance is $O(1/(nh_n^d))$ for any admissible choice of $\varepsilon$ between $0$ and $d\bigl(1 - \frac{1}{\rho - p}\bigr)$.

Proof of Lemma 6.3. Here the calculation of $V_n$ is exactly the same as in the proof of Lemma
6.2. In fact, the delicate point again consists in bounding $C_n$. To do so, we give the common
probability density function, say $\Lambda_Z$, of all random variables $\{Z_j - Z_i, i < j\}$. Since the
variables $\{Z_i, 0 \le i \le n\}$ are supposed to be independent and symmetrically distributed, we
have $\Lambda_Z(t) = g_J^{*2}(t) = \int g_J(t - y) g_J(y)\,dy$ with support over $[-\delta, \delta]$. Let us denote by $\lfloor x \rfloor$ the
largest integer less than or equal to the real $x$, and set $r_0 := \lceil u_0/\delta \rceil$ and $r_1 := \lfloor h_n^{-(d-\varepsilon)} \rfloor$ for some
$0 < \varepsilon < d$ to be specified later. Now stationarity implies

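$$\begin{aligned}
C_n &= \frac{2}{n^2h_n^{2d}}\sum_{r=1}^{n-1}\sum_{p=1}^{n-r}\mathrm{Cov}\big(\mathbf{1}_{\pi_{nj_1}}(X_{t_p}),\mathbf{1}_{\pi_{nj_2}}(X_{t_{p+r}})\big)\\
&= \frac{2}{nh_n^{2d}}\sum_{r=1}^{n-1}\Big(1-\frac{r}{n}\Big)\int_{-\delta}^{\delta}\mathrm{Cov}\big(\mathbf{1}_{\pi_{nj_1}}(X_0),\mathbf{1}_{\pi_{nj_2}}(X_{r\delta+t})\big)\,\Lambda_Z(t)\,dt =: C_{n,1}+C_{n,2}+C_{n,3},
\end{aligned}$$

where

$$C_{n,i} := \frac{2}{nh_n^{2d}}\sum_{r\in E_i}\Big(1-\frac{r}{n}\Big)\int_{-\delta}^{\delta}\mathrm{Cov}\big(\mathbf{1}_{\pi_{nj_1}}(X_0),\mathbf{1}_{\pi_{nj_2}}(X_{r\delta+t})\big)\,\Lambda_Z(t)\,dt,\quad i=1,2,3,$$

with $E_1 = \{1,\dots,r_0\}$, $E_2 = \{r_0+1,\dots,r_1\}$ and $E_3 = \{r_1+1,\dots,n-1\}$. By the Cauchy-Schwarz inequality we get

$$|C_{n,1}| \le \frac{2}{nh_n^{2d}}\sum_{r=1}^{r_0}\sqrt{\mathrm{Var}\big(\mathbf{1}_{\pi_{nj_1}}(X_0)\big)}\sqrt{\mathrm{Var}\big(\mathbf{1}_{\pi_{nj_2}}(X_0)\big)}\int_{-\delta}^{\delta}\Lambda_Z(t)\,dt.$$

Then using $A_1(i)$,

$$|C_{n,2}| \le \frac{2}{nh_n^{2d}}\sum_{r=r_0+1}^{r_1}\int_{-\delta}^{\delta}\iint_{\pi_{nj_1}\times\pi_{nj_2}}|g_{r\delta+t}(x,y)|\,dx\,dy\;\Lambda_Z(t)\,dt \le \frac{2(r_1-r_0)\,h_n^dk(\xi_{j_1})}{nh_n^d}\int_{-\delta}^{\delta}\Lambda_Z(t)\,dt.$$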
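The density $\Lambda_Z$ of the jitter differences enters every bound above through $\int_{-\delta}^{\delta}\Lambda_Z(t)\,dt = 1$. As a sanity check, the following sketch takes one illustrative case that the text does not single out: jitters uniform on $[-\delta/2,\delta/2]$, for which the convolution $g_Z * g_Z$ is the triangular density on $[-\delta,\delta]$. The function names are hypothetical and only serve this illustration.

```python
# Illustrative assumption: the jitters Z_i are i.i.d. uniform on [-delta/2, delta/2].
# Function names are hypothetical; they do not come from the paper.


def jitter_density(y, delta):
    """Density g_Z of a single jitter, uniform on [-delta/2, delta/2]."""
    return 1.0 / delta if abs(y) <= delta / 2 else 0.0


def difference_density(t, delta, grid=20000):
    """Midpoint Riemann sum for (g_Z * g_Z)(t) = integral of g_Z(t - y) g_Z(y) dy."""
    step = delta / grid
    total = 0.0
    for k in range(grid):
        y = -delta / 2 + (k + 0.5) * step  # midpoint of the k-th cell of the support
        total += jitter_density(t - y, delta) * jitter_density(y, delta)
    return total * step


def triangular_density(t, delta):
    """Closed form of g_Z * g_Z for uniform jitters: a triangle on [-delta, delta]."""
    return max(0.0, 1.0 - abs(t) / delta) / delta


if __name__ == "__main__":
    delta = 0.5
    for t in (-0.6, -0.25, 0.0, 0.1, 0.4):
        print(t, difference_density(t, delta), triangular_density(t, delta))
```

The numerical convolution vanishes outside $[-\delta,\delta]$, which is the support property used when the integrals over $t$ are restricted to that interval.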



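By Davydov's inequality and $A_1(ii)$, for any $(p,q) \in\, ]1,\rho-1] \times [1+\frac{1}{\rho-2},\infty[$ such that $\frac{1}{p}+\frac{1}{q}=1$, and since $\alpha_X(\cdot)$ is arithmetically decreasing, the sum over $r \in E_3$ can be compared with an integral over $[(r_1-1)\delta,\infty[$. Up to a constant that does not depend on $n$, $|C_{n,3}|$ is therefore bounded by

$$\frac{1}{nh_n^{2d}}\;h_n^{d/q}\,f(\xi_{j_1})^{\frac{1}{2q}}f(\xi_{j_2})^{\frac{1}{2q}}\,\big((r_1-1)\delta\big)^{1-\frac{\rho}{p}}.$$

Now if $p < \rho$ (so that $1-\frac{\rho}{p} < 0$), and since $r_1 > h_n^{-(d-\varepsilon)} - 1$, we may write

$$\big((r_1-1)\delta\big)^{1-\frac{\rho}{p}} \le \big(\delta h_n^{-(d-\varepsilon)}\big)^{1-\frac{\rho}{p}}\big(1-2h_n^{d-\varepsilon}\big)^{1-\frac{\rho}{p}},$$

where $(1-2h_n^{d-\varepsilon}) \to 1$ as $n \to \infty$. Hence, since $\frac{1}{q} = 1-\frac{1}{p}$, we obtain

$$|C_{n,3}| \le \frac{k_5}{nh_n^{d}}\big(f(\xi_{j_1})f(\xi_{j_2})\big)^{\frac12(1-\frac1p)}\,h_n^{\frac1p\{(d-\varepsilon)(\rho-p)-d\}}\big(1-2h_n^{d-\varepsilon}\big)^{1-\frac{\rho}{p}}$$

for a positive constant $k_5$ that does not depend on $n$.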

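Finally, setting $k_4 := 2\lfloor u_0/\delta\rfloor$ and denoting by $k_5$ the constant obtained in the bound of $C_{n,3}$, one has

$$\begin{aligned}
nh_n^d\,\mathrm{Cov}(f_{j_1},f_{j_2}) \le{}& -h_n^d f(\xi_{j_1})f(\xi_{j_2}) + k_4\sqrt{f(\xi_{j_1})f(\xi_{j_2})\big(1-h_n^df(\xi_{j_1})\big)\big(1-h_n^df(\xi_{j_2})\big)}\\
&+ 2k(\xi_{j_1})\Big(h_n^{-(d-\varepsilon)} - \Big\lfloor\frac{u_0}{\delta}\Big\rfloor\Big)h_n^d + k_5\big(f(\xi_{j_1})f(\xi_{j_2})\big)^{\frac12(1-\frac1p)}\,h_n^{\frac1p\{(d-\varepsilon)(\rho-p)-d\}}\big(1-2h_n^{d-\varepsilon}\big)^{1-\frac{\rho}{p}},
\end{aligned}\tag{6.5}$$

which implies the desired result. The covariance is thus $O(1/(nh_n^d))$ for any $\varepsilon$ in $[0, d(1-\frac{1}{\rho-p})]$.

6.1.2 Proof of Theorem 3.2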

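Renewal sampling - By integrating over $\pi_{nj}$ the right-hand side of (6.2) and by summing up over all hypercubes, we first derive an asymptotic upper bound for the IV. For some $\varepsilon \in [0, d(1-\frac{1}{\rho-p})]$,

$$nh_n^d\int_{\pi_{nj}}\mathrm{Var}(f_j)\,dx \le h_n^d\Big[f(\xi_j)\big(1-h_n^df(\xi_j)\big)(1+k_1) + k_2k(\xi_j)h_n^{\varepsilon} + k_3f(\xi_j)^{1-\frac1p}\,h_n^{\frac1p\{(d-\varepsilon)(\rho-p)-d\}}\Big].$$

Then using the approximation of integrals by Riemann sums, i.e.,

$$\sum_j h_n^df^{\kappa}(\xi_j) = \|f^{\kappa}\|_1 + o(1),\quad \kappa = 1-\tfrac1p,\,1,\,2,\qquad\text{and}\qquad \sum_j h_n^dk(\xi_j) = \|k\|_1 + o(1),$$

one has

$$nh_n^d\,\mathrm{IV}(f_n^H) \le \Big\{1 + k_1 + k_2\|k\|_1h_n^{\varepsilon} + k_3\|f^{1-\frac1p}\|_1\,h_n^{\frac1p\{(d-\varepsilon)(\rho-p)-d\}}\Big\}(1+o(1)).\tag{6.6}$$

The two parts of the theorem follow from the choice $h_n = cn^{-1/(d+2)}$, $0 < c < \infty$. So Lemma 3.1 yields

$$\lim_{n\to\infty} n^{\frac{2}{d+2}}\,\mathrm{ISB}(f_n^H) = \frac{c^2}{12}R_d(f'),$$

and combining with (6.6), if $p = \rho-1$ ($\varepsilon = 0$), we have

$$\limsup_{n\to\infty} n^{\frac{2}{d+2}}\,\mathrm{MISE}(f_n^H) \le \frac{c^2}{12}R_d(f') + \frac{1}{c^d}\Big\{1 + k_1 + k_2\|k\|_1 + k_3\|f^{1-\frac1p}\|_1\Big\}.$$

If $p < \rho-1$ ($\varepsilon > 0$), we improve the asymptotic constant:

$$\limsup_{n\to\infty} n^{\frac{2}{d+2}}\,\mathrm{MISE}(f_n^H) \le \frac{c^2}{12}R_d(f') + \frac{1}{c^d}\{1+k_1\}.$$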

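Jittered sampling - Now, let us integrate over $\pi_{nj}$ the right-hand side of (6.3):

$$nh_n^d\int_{\pi_{nj}}\mathrm{Var}(f_j)\,dx \le h_n^d\Big[f(\xi_j)\big(1-h_n^df(\xi_j)\big)\{1+k_4\} + 2k(\xi_j)\Big(h_n^{-(d-\varepsilon)} - \Big\lfloor\frac{u_0}{\delta}\Big\rfloor\Big)h_n^d + k_5f(\xi_j)^{1-\frac1p}\,h_n^{\frac1p\{(d-\varepsilon)(\rho-p)-d\}}\big(1-2h_n^{d-\varepsilon}\big)^{1-\frac{\rho}{p}}\Big]$$

for any $\varepsilon \in [0, d(1-\frac{1}{\rho-p})]$. Then sum up over all indexes $j$ to obtain

$$nh_n^d\,\mathrm{IV}(f_n^H) \le \Big\{1 + k_4 + 2\|k\|_1\Big(h_n^{-(d-\varepsilon)} - \Big\lfloor\frac{u_0}{\delta}\Big\rfloor\Big)h_n^d + k_5\|f^{1-\frac1p}\|_1\,h_n^{\frac1p\{(d-\varepsilon)(\rho-p)-d\}}\Big\}(1+o(1)).$$

Therefore, if $p = \rho-1$, the bin width choice $h_n = cn^{-1/(d+2)}$, $0 < c < \infty$, entails

$$\limsup_{n\to\infty} n^{\frac{2}{d+2}}\,\mathrm{MISE}(f_n^H) \le \frac{c^2}{12}R_d(f') + \frac{1}{c^d}\Big\{1 + k_4 + 2\|k\|_1 + k_5\|f^{1-\frac1p}\|_1\Big\}.$$

If $p < \rho-1$, we get a better asymptotic constant:

$$\limsup_{n\to\infty} n^{\frac{2}{d+2}}\,\mathrm{MISE}(f_n^H) \le \frac{c^2}{12}R_d(f') + \frac{1}{c^d}\{1+k_4\}.\ \blacksquare$$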
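To make the role of the constant $c$ concrete, the sketch below evaluates the bin width $h_n = cn^{-1/(d+2)}$ and the improved limiting bound $\frac{c^2}{12}R_d(f') + \frac{1}{c^d}(1+k)$, where $k$ stands for $k_1$ or $k_4$. Minimizing this bound over $c$ is an illustration added here, not a step of the proof, and the function names are hypothetical.

```python
import math


def histogram_bin_width(n, d, c):
    """Bin width h_n = c * n**(-1/(d+2)) used in the proofs above."""
    return c * n ** (-1.0 / (d + 2))


def mise_bound_constant(c, d, roughness, k):
    """Limiting bound c^2/12 * R_d(f') + (1 + k)/c^d on n^(2/(d+2)) * MISE.

    `roughness` stands for R_d(f'); `k` stands for the sampling constant
    (k_1 for renewal sampling, k_4 for jittered sampling).
    """
    return c ** 2 / 12.0 * roughness + (1.0 + k) / c ** d


def minimizing_c(d, roughness, k):
    """Value of c at which the derivative of mise_bound_constant vanishes."""
    return (6.0 * d * (1.0 + k) / roughness) ** (1.0 / (d + 2))


if __name__ == "__main__":
    # Illustrative input: d = 1 and a standard normal density, R(f') = 1/(4*sqrt(pi)).
    roughness = 1.0 / (4.0 * math.sqrt(math.pi))
    for k in (0.0, 0.5, 2.0):  # hypothetical values of the sampling constant
        c_star = minimizing_c(1, roughness, k)
        print(k, c_star, mise_bound_constant(c_star, 1, roughness, k))
```

With $k = 0$ and a standard normal density in dimension one, the minimizing constant is close to 3.49, the constant of the classical normal reference rule for histograms. A positive $k$ pushes the minimizing $c$, and hence the bin width, upwards.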

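6.1.3 Variance bounds with high frequency sampling

The period now depends on the sample size, in that $\delta_n \downarrow 0$ as $n \to \infty$. We start by giving a new bound for the variance of $f_n^H(x)$, which depends upon $\gamma_0$.

Lemma 6.4 (high frequency sampling). If $A_0(ii)$ and $A_1'(i)(ii)$ hold, then we obtain

$$\begin{aligned}
nh_n^d\,\mathrm{Var}(f_j) \le{}& f(\xi_j)\big(1-h_n^df(\xi_j)\big) + 2\varphi(\xi_j)\Big(\sum_{r=1}^{\lfloor u_0/\delta_n\rfloor} r^{-\gamma_0}\Big)h_n^d\delta_n^{-\gamma_0} + \Big\{2u_0\|f\|_\infty f(\xi_j)\\
&+ 2(u_1-u_0+\delta_n)\,k(\xi_j)\sup_{u\in[u_0,u_1]}\pi(u) + 2k(\xi_j)\int_{u_1}^{\infty}\pi(u)\,du\,\Big(1+\frac{\pi(u_1)}{\int_{u_1}^{\infty}\pi(u)\,du}\,\delta_n\Big)\Big\}h_n^d\delta_n^{-1}
\end{aligned}\tag{6.7}$$

with $\xi_j \in \pi_{nj}$. It entails that the variance is $O(1/(nh_n^d))$ with the following choices:

$$\delta_n^*(\gamma_0) = d_1h_n^d\,\mathbf{1}_{\{\gamma_0<1\}} + d_2h_n^d\ln(h_n^{-d})\,\mathbf{1}_{\{\gamma_0=1\}} + d_3h_n^{d/\gamma_0}\,\mathbf{1}_{\{\gamma_0>1\}},\qquad 0 < d_1,d_2,d_3 < \infty.$$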
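The three regimes of the optimal period can be turned into a small calculator. The sketch below is only an illustration: the function names are hypothetical, the total observation time is taken to be $T_n = n\delta_n$ (an assumption about how the design is laid out), and the argument of the logarithm in the case $\gamma_0 = 1$ is written as $h_n^{-d}$, which only affects the constant $d_2$.

```python
import math


def optimal_sampling_period(h, d, gamma0, d1=1.0, d2=1.0, d3=1.0):
    """Sampling period delta_n^*(gamma0) for a bin width h in (0, 1).

    gamma0 < 1 : d1 * h^d
    gamma0 = 1 : d2 * h^d * ln(h^(-d))   (log argument is an assumption)
    gamma0 > 1 : d3 * h^(d / gamma0)
    """
    if gamma0 < 1:
        return d1 * h ** d
    if gamma0 == 1:
        return d2 * h ** d * d * math.log(1.0 / h)
    return d3 * h ** (d / gamma0)


def observation_time(n, d, gamma0, c=1.0):
    """Total observation time T_n = n * delta_n^* with h_n = c * n^(-1/(d+2))."""
    h = c * n ** (-1.0 / (d + 2))
    return n * optimal_sampling_period(h, d, gamma0)


if __name__ == "__main__":
    for gamma0 in (0.5, 1.0, 2.0):
        print(gamma0, [round(observation_time(n, 1, gamma0), 2) for n in (10**3, 10**4, 10**5)])
```

For a given $n$, the required observation time grows with $\gamma_0$: smoother sample paths call for a longer time span to reach the same rate.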

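Proof of Lemma 6.4. The calculation of $V_n$ remains identical. Now to upper bound $C_n$, we have to make use of the local assumption $A_1'(i)$. Set $r_n^0 := \lfloor u_0/\delta_n\rfloor$ and $r_n^1 := \lfloor u_1/\delta_n\rfloor$. Since $X$ is stationary one may write

$$C_n = \frac{2}{nh_n^{2d}}\sum_{r=1}^{n-1}\Big(1-\frac{r}{n}\Big)\mathrm{Cov}\big(\mathbf{1}_{\pi_{nj_1}}(X_0),\mathbf{1}_{\pi_{nj_2}}(X_{r\delta_n})\big) =: C_{n,1}+C_{n,2},$$

where

$$C_{n,1} := \frac{2}{nh_n^{2d}}\sum_{r=1}^{r_n^0}\Big(1-\frac{r}{n}\Big)\mathrm{Cov}\big(\mathbf{1}_{\pi_{nj_1}}(X_0),\mathbf{1}_{\pi_{nj_2}}(X_{r\delta_n})\big),$$

$$C_{n,2} := \frac{2}{nh_n^{2d}}\sum_{r=r_n^0+1}^{n-1}\Big(1-\frac{r}{n}\Big)\mathrm{Cov}\big(\mathbf{1}_{\pi_{nj_1}}(X_0),\mathbf{1}_{\pi_{nj_2}}(X_{r\delta_n})\big).$$

First using $A_1'(i)$ we get

$$|C_{n,1}| \le \frac{2}{nh_n^{2d}}\sum_{r=1}^{r_n^0}\iint_{\pi_{nj_1}\times\pi_{nj_2}}\big[f_{r\delta_n}(x,y) + \|f\|_\infty f(x)\big]\,dx\,dy \le \frac{2}{nh_n^{d}}\sum_{r=1}^{r_n^0}\int_{\pi_{nj_1}}\big[\varphi(x)(r\delta_n)^{-\gamma_0} + \|f\|_\infty f(x)\big]\,dx.$$

Setting $k_6 := 2u_0\|f\|_\infty$, we obtain

$$nh_n^d\,|C_{n,1}| \le 2\varphi(\xi_{j_1})\Big(\sum_{r=1}^{r_n^0} r^{-\gamma_0}\Big)h_n^d\delta_n^{-\gamma_0} + k_6f(\xi_{j_1})\,h_n^d\delta_n^{-1},$$

where $\xi_{j_1} \in \pi_{nj_1}$. Then using $A_1'(ii)$,

$$|C_{n,2}| \le \frac{2}{nh_n^{2d}}\sum_{r=r_n^0+1}^{n-1}\iint_{\pi_{nj_1}\times\pi_{nj_2}}|g_{r\delta_n}(x,y)|\,dx\,dy \le \frac{2k(\xi_{j_1})}{n}\Big\{\sum_{r=r_n^0+1}^{r_n^1}\pi(r\delta_n) + \sum_{r=r_n^1+1}^{n-1}\pi(r\delta_n)\Big\},$$

where $\xi_{j_1} \in \pi_{nj_1}$. On the one hand, one has

$$\sum_{r=r_n^0+1}^{r_n^1}\pi(r\delta_n) \le (r_n^1-r_n^0)\sup_{u\in[u_0,u_1]}\pi(u) \le (u_1-u_0)\sup_{u\in[u_0,u_1]}\pi(u)\,\Big(\frac{1}{\delta_n}+\frac{1}{u_1-u_0}\Big).$$

On the other hand, the monotonicity of $\pi(\cdot)$ implies

$$\sum_{r=r_n^1+1}^{n-1}\pi(r\delta_n) \le \frac{1}{\delta_n}\int_{u_1}^{\infty}\pi(u)\,du + \pi(u_1).$$

Setting $k_7 := 2(u_1-u_0)\sup_{u\in[u_0,u_1]}\pi(u)$ and $k_8 := 2\int_{u_1}^{\infty}\pi(u)\,du$, we thus obtain

$$|C_{n,2}| \le \frac{1}{nh_n^d}\Big\{k_7k(\xi_{j_1})\Big(1+\frac{\delta_n}{u_1-u_0}\Big)h_n^d\delta_n^{-1} + k_8k(\xi_{j_1})\Big(1+\frac{\pi(u_1)}{\int_{u_1}^{\infty}\pi(u)\,du}\,\delta_n\Big)h_n^d\delta_n^{-1}\Big\}.$$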

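Thence

$$\begin{aligned}
nh_n^d\,\mathrm{Cov}(f_{j_1},f_{j_2}) \le{}& -h_n^df(\xi_{j_1})f(\xi_{j_2}) + 2\varphi(\xi_{j_1})\Big(\sum_{r=1}^{\lfloor u_0/\delta_n\rfloor} r^{-\gamma_0}\Big)h_n^d\delta_n^{-\gamma_0} + k_6f(\xi_{j_1})\,h_n^d\delta_n^{-1}\\
&+ k_7k(\xi_{j_1})\Big(1+\frac{\delta_n}{u_1-u_0}\Big)h_n^d\delta_n^{-1} + k_8k(\xi_{j_1})\Big(1+\frac{\pi(u_1)}{\int_{u_1}^{\infty}\pi(u)\,du}\,\delta_n\Big)h_n^d\delta_n^{-1},
\end{aligned}\tag{6.8}$$

which leads to the desired result. Using (6.8), we also deduce the optimal choices $\delta_n^*(\gamma_0)$ of $\delta_n$, i.e. the smallest values of $\delta_n$ such that $nh_n^dC_n$ is $O(1)$. These choices are given by (3.1) in accordance with the values of $\gamma_0$. ■

6.1.4 Proof of Theorem 3.3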

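By integrating over $\pi_{nj}$ the right-hand side of (6.7):

$$\begin{aligned}
nh_n^d\int_{\pi_{nj}}\mathrm{Var}(f_j)\,dx \le h_n^d\Big[& f(\xi_j)\big(1-h_n^df(\xi_j)\big) + 2\varphi(\xi_j)\Big(\sum_{r=1}^{\lfloor u_0/\delta_n\rfloor} r^{-\gamma_0}\Big)h_n^d\delta_n^{-\gamma_0} + k_6f(\xi_j)\,h_n^d\delta_n^{-1}\\
&+ k_7k(\xi_j)\Big(1+\frac{\delta_n}{u_1-u_0}\Big)h_n^d\delta_n^{-1} + k_8k(\xi_j)\Big(1+\frac{\pi(u_1)}{\int_{u_1}^{\infty}\pi(u)\,du}\,\delta_n\Big)h_n^d\delta_n^{-1}\Big].
\end{aligned}$$

Then let us sum up over all indexes $j$. Since $\varphi$ is Riemann-integrable, we obtain

$$nh_n^d\,\mathrm{IV}(f_n^H) \le \Big\{1 + 2\|\varphi\|_1\Big(\sum_{r=1}^{\lfloor u_0/\delta_n\rfloor} r^{-\gamma_0}\Big)h_n^d\delta_n^{-\gamma_0} + k_6h_n^d\delta_n^{-1} + (k_7+k_8)\|k\|_1h_n^d\delta_n^{-1}\Big\}(1+o(1)).$$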

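According to the values of $\gamma_0$, we derive all asymptotic bounds with optimal choices of $\delta_n$:

- if $\gamma_0 < 1$, the choice $\delta_n^*(\gamma_0) = d_1h_n^d$, $0 < d_1 < \infty$, entails

$$\Big(\sum_{r=1}^{\lfloor u_0/\delta_n\rfloor} r^{-\gamma_0}\Big)h_n^d\delta_n^{-\gamma_0} \le \frac{(u_0/\delta_n)^{1-\gamma_0}}{1-\gamma_0}\,h_n^d\delta_n^{-\gamma_0} = \frac{u_0^{1-\gamma_0}}{d_1(1-\gamma_0)};$$

- if $\gamma_0 = 1$, the choice $\delta_n^*(\gamma_0) = d_2h_n^d\ln(h_n^{-d})$, $0 < d_2 < \infty$, entails

$$\Big(\sum_{r=1}^{\lfloor u_0/\delta_n\rfloor} r^{-1}\Big)h_n^d\delta_n^{-1} \le \frac{1+\ln(u_0/\delta_n)}{d_2\ln(h_n^{-d})} \longrightarrow \frac{1}{d_2}\quad\text{as } n\to\infty;$$

- if $\gamma_0 > 1$, the choice $\delta_n^*(\gamma_0) = d_3h_n^{d/\gamma_0}$, $0 < d_3 < \infty$, entails

$$\Big(\sum_{r=1}^{\lfloor u_0/\delta_n\rfloor} r^{-\gamma_0}\Big)h_n^d\delta_n^{-\gamma_0} \le \frac{\gamma_0}{\gamma_0-1}\,d_3^{-\gamma_0} = \frac{\gamma_0}{d_3^{\gamma_0}(\gamma_0-1)}.$$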
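The three cases can be checked numerically. The sketch below evaluates the term $\big(\sum_{r\le\lfloor u_0/\delta_n\rfloor} r^{-\gamma_0}\big)h_n^d\delta_n^{-\gamma_0}$ under the corresponding choice of $\delta_n$ and shows that it stays bounded as $n$ grows. The constants ($d = 1$, $c = 1$, $u_0 = 1$, $d_1 = d_2 = d_3 = 1$) and the function name are illustrative assumptions, and the logarithm in the case $\gamma_0 = 1$ is taken as $\ln(h_n^{-d})$.

```python
import math


def variance_term(n, gamma0, d=1, c=1.0, u0=1.0, d_const=1.0):
    """Return h^d * delta^(-gamma0) * sum_{r=1}^{r0} r^(-gamma0), r0 = floor(u0 / delta).

    delta is the period chosen according to the regime of gamma0, with
    h = c * n^(-1/(d+2)); all default constants are illustrative.
    """
    h = c * n ** (-1.0 / (d + 2))
    if gamma0 < 1:
        delta = d_const * h ** d
    elif gamma0 == 1:
        delta = d_const * h ** d * d * math.log(1.0 / h)
    else:
        delta = d_const * h ** (d / gamma0)
    r0 = math.floor(u0 / delta)
    partial_sum = sum(r ** (-gamma0) for r in range(1, r0 + 1))
    return h ** d * delta ** (-gamma0) * partial_sum


if __name__ == "__main__":
    for gamma0 in (0.5, 1.0, 2.0):
        print(gamma0, [round(variance_term(n, gamma0), 4) for n in (10**3, 10**4, 10**5, 10**6)])
```

With these constants the printed values remain below $u_0^{1-\gamma_0}/(1-\gamma_0) = 2$ for $\gamma_0 = 0.5$ and below $\gamma_0/(\gamma_0-1) = 2$ for $\gamma_0 = 2$, in line with the bounds of the first and third cases.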



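So setting

$$C_{\gamma_0} := \frac{1}{d_1}\Big\{\frac{2\|\varphi\|_1u_0^{1-\gamma_0}}{1-\gamma_0} + k_6 + (k_7+k_8)\|k\|_1\Big\}\mathbf{1}_{\{\gamma_0<1\}} + \frac{2\|\varphi\|_1}{d_2}\,\mathbf{1}_{\{\gamma_0=1\}} + \frac{2\|\varphi\|_1\gamma_0}{d_3^{\gamma_0}(\gamma_0-1)}\,\mathbf{1}_{\{\gamma_0>1\}},$$

it remains to choose $h_n = cn^{-1/(d+2)}$, $0 < c < \infty$, as in Theorem 3.2:

$$\limsup_{n\to\infty} n^{\frac{2}{d+2}}\,\mathrm{MISE}(f_n^H) \le \frac{c^2}{12}R_d(f') + \frac{1}{c^d}\big(1+C_{\gamma_0}\big),$$

and we can also improve our asymptotic constant for any choice of $\delta_n$ such that $\delta_n/\delta_n^*(\gamma_0) \to \infty$ as $n \to \infty$:

$$\limsup_{n\to\infty} n^{\frac{2}{d+2}}\,\mathrm{MISE}(f_n^H) \le \frac{c^2}{12}R_d(f') + \frac{1}{c^d}.\ \blacksquare$$

6.2 Frequency polygon
6.2.1 Proof of Theorem 4.2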

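Renewal sampling - First observe that

$$\int_{\mathbb{R}}\mathrm{Var}\big(f_n^{FP}(x)\big)\,dx = \sum_{j\in\mathbb{Z}}\int_{[c_j,c_{j+1}[}\mathrm{Var}\big(f_n^{FP}(x)\big)\,dx,$$

where

$$\int_{c_j}^{c_{j+1}}\mathrm{Var}\big(f_n^{FP}(x)\big)\,dx = \frac{1}{h_n^2}\int_{c_j}^{c_{j+1}}\Big\{(x-c_j)^2\,\mathrm{Var}(f_{j+1}) + (c_{j+1}-x)^2\,\mathrm{Var}(f_j) + 2(x-c_j)(c_{j+1}-x)\,\mathrm{Cov}(f_j,f_{j+1})\Big\}\,dx.$$

For any $j \in \mathbb{Z}$, let us denote by $V_j$ (respectively $C_{j,j+1}$) an upper bound for $nh_n\mathrm{Var}(f_j)$ (respectively $nh_n\mathrm{Cov}(f_j,f_{j+1})$) that is independent of $x$. We get

$$nh_n\int_{c_j}^{c_{j+1}}\mathrm{Var}\big(f_n^{FP}(x)\big)\,dx \le \frac{h_n}{3}\big\{V_j + V_{j+1} + C_{j,j+1}\big\}.\tag{6.9}$$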
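The weights in this bound come from three elementary integrals. The computation below assumes the usual one-dimensional frequency polygon $f_n^{FP}(x) = h_n^{-1}\{(c_{j+1}-x)f_j + (x-c_j)f_{j+1}\}$ on $[c_j,c_{j+1}]$ with $c_{j+1}-c_j = h_n$. This normalisation is an assumption here, chosen to be consistent with the quadratic weights of the variance expansion.

```latex
\begin{align*}
\int_{c_j}^{c_{j+1}} (x-c_j)^2\,dx
  &= \int_{c_j}^{c_{j+1}} (c_{j+1}-x)^2\,dx = \frac{h_n^3}{3},\\
\int_{c_j}^{c_{j+1}} 2(x-c_j)(c_{j+1}-x)\,dx
  &= 2\Big(\frac{h_n^3}{2}-\frac{h_n^3}{3}\Big) = \frac{h_n^3}{3},\\
\int_{c_j}^{c_{j+1}} \mathrm{Var}\big(f_n^{FP}(x)\big)\,dx
  &= \frac{h_n}{3}\big\{\mathrm{Var}(f_j)+\mathrm{Var}(f_{j+1})+\mathrm{Cov}(f_j,f_{j+1})\big\}.
\end{align*}
```

Each of the two variances and the covariance therefore carries the same weight $h_n/3$. After summation over $j$, the two variance terms produce the leading constant $2/3$ of the integrated variance.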



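Insert now both expressions (6.2) and (6.4) in (6.9); then for $\varepsilon \in [0, 1-\frac{1}{\rho-p}]$,

$$\begin{aligned}
nh_n\int_{c_j}^{c_{j+1}}\mathrm{Var}\big(f_n^{FP}(x)\big)\,dx \le{}& \frac{h_n}{3}\Big[f(\xi_j)\big(1-h_nf(\xi_j)\big)(1+k_1) + k_2k(\xi_j)h_n^{\varepsilon} + k_3f(\xi_j)^{1-\frac1p}\,h_n^{\frac1p\{(1-\varepsilon)(\rho-p)-1\}}\Big]\\
&+ \frac{h_n}{3}\Big[f(\xi_{j+1})\big(1-h_nf(\xi_{j+1})\big)(1+k_1) + k_2k(\xi_{j+1})h_n^{\varepsilon} + k_3f(\xi_{j+1})^{1-\frac1p}\,h_n^{\frac1p\{(1-\varepsilon)(\rho-p)-1\}}\Big]\\
&+ \frac{h_n}{3}\Big[-h_nf(\xi_j)f(\xi_{j+1}) + k_1\sqrt{f(\xi_j)f(\xi_{j+1})\big(1-h_nf(\xi_j)\big)\big(1-h_nf(\xi_{j+1})\big)}\\
&\qquad\quad + k_2k(\xi_j)h_n^{\varepsilon} + k_3\big(f(\xi_j)f(\xi_{j+1})\big)^{\frac12(1-\frac1p)}\,h_n^{\frac1p\{(1-\varepsilon)(\rho-p)-1\}}\Big].
\end{aligned}$$

We bound the IV of $f_n^{FP}$ by summing up over all indexes $j$. So for $\varepsilon \in [0, 1-\frac{1}{\rho-p}]$,

$$nh_n\,\mathrm{IV}(f_n^{FP}) \le \Big\{\frac23 + k_1 + k_2\|k\|_1h_n^{\varepsilon} + k_3\|f^{1-\frac1p}\|_1\,h_n^{\frac1p\{(1-\varepsilon)(\rho-p)-1\}}\Big\}(1+o(1)).\tag{6.10}$$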



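Now the bin width choice $h_n = cn^{-1/5}$, $0 < c < \infty$, in Lemma 4.1 yields first

$$\lim_{n\to\infty} n^{\frac45}\,\mathrm{ISB}(f_n^{FP}) = \frac{49}{2880}\,c^4R(f''),$$

and combining with (6.10), if $p = \rho-1$ ($\varepsilon = 0$):

$$\limsup_{n\to\infty} n^{\frac45}\,\mathrm{MISE}(f_n^{FP}) \le \frac{49}{2880}\,c^4R(f'') + \frac{1}{c}\Big\{\frac23 + k_1 + k_2\|k\|_1 + k_3\|f^{1-\frac1p}\|_1\Big\}.$$

If $p < \rho-1$ ($0 < \varepsilon < 1$):

$$\limsup_{n\to\infty} n^{\frac45}\,\mathrm{MISE}(f_n^{FP}) \le \frac{49}{2880}\,c^4R(f'') + \frac{1}{c}\Big\{\frac23 + k_1\Big\}.$$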



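Jittered sampling - The outline of the proof is unchanged. Insert both expressions (6.3) and (6.5) in (6.9) and sum up over all indexes $j$; it then follows that for $\varepsilon \in [0, 1-\frac{1}{\rho-p}]$,

$$nh_n\,\mathrm{IV}(f_n^{FP}) \le \Big\{\frac23 + k_4 + 2\|k\|_1\Big(h_n^{-(1-\varepsilon)} - \Big\lfloor\frac{u_0}{\delta}\Big\rfloor\Big)h_n + k_5\|f^{1-\frac1p}\|_1\,h_n^{\frac1p\{(1-\varepsilon)(\rho-p)-1\}}\Big\}(1+o(1)).$$

Take $h_n = cn^{-1/5}$, $0 < c < \infty$; then if $p = \rho-1$:

$$\limsup_{n\to\infty} n^{\frac45}\,\mathrm{MISE}(f_n^{FP}) \le \frac{49}{2880}\,c^4R(f'') + \frac{1}{c}\Big\{\frac23 + k_4 + 2\|k\|_1 + k_5\|f^{1-\frac1p}\|_1\Big\}.$$

If $p < \rho-1$:

$$\limsup_{n\to\infty} n^{\frac45}\,\mathrm{MISE}(f_n^{FP}) \le \frac{49}{2880}\,c^4R(f'') + \frac{1}{c}\Big\{\frac23 + k_4\Big\}.\ \blacksquare$$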
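6.2.2 Proof of Theorem 4.3

Insert now both expressions (6.7) and (6.8) in (6.9) and sum up over all indexes $j$; we get

$$nh_n\,\mathrm{IV}(f_n^{FP}) \le \Big\{\frac23 + 2\|\varphi\|_1\Big(\sum_{r=1}^{\lfloor u_0/\delta_n\rfloor} r^{-\gamma_0}\Big)h_n\delta_n^{-\gamma_0} + k_6h_n\delta_n^{-1} + (k_7+k_8)\|k\|_1h_n\delta_n^{-1}\Big\}(1+o(1)).$$

Then $h_n = cn^{-1/5}$, $0 < c < \infty$, together with the optimal choices $\delta_n^*$ of $\delta_n$, yields

$$\limsup_{n\to\infty} n^{\frac45}\,\mathrm{MISE}(f_n^{FP}) \le \frac{49}{2880}\,c^4R(f'') + \frac{1}{c}\Big\{\frac23 + C_{\gamma_0}\Big\},$$

and, if $\delta_n$ is such that $\delta_n/\delta_n^*(\gamma_0) \to \infty$ as $n \to \infty$, $C_{\gamma_0}$ is removable in the limiting bound. ■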